The gene encoding the spike glycoprotein (S) of bovine
enteric coronavirus (BECV) was cloned and its complete
sequence of 4092 nucleotides was determined.
This sequence contained a single long open reading
frame with a coding capacity of 1363 amino acids (M_r
150747). The predicted protein had 19 N-glycosylation
sites. A signal sequence comprising 17 amino acids was
observed starting from the first methionine residue. A
potential peptidase cleavage site was located between
amino acids 763 and 767. These cleavages explain the
maturation of the primary product of the S gene
to S1 (M_r 104692) and S2 (M_r 84175) spike structural
proteins. Two amphipathic α-helices (amino acids 1007
to 1077 and 1269 to 1294) which may constitute the
12 nm stalk of the viral spike were also observed;
another α-helix (amino acids 1305 to 1335) may be
involved in the anchorage of the spike in the viral
membrane. Comparison of this protein sequence to the
described homologous mouse hepatitis virus (MHV) strain
A59 and MHV-JHM S protein sequences led us to
suggest that MHV-A59 and MHV-JHM S genes could
be derived from a deletion of the BECV S gene.


Bovine enteric coronavirus (BECV) was first identified
by electron microscopy in faecal samples of calves
suffering from acute enteritis (Mebus et al., 1973a). The
involvement of BECV in the aetiology of diarrhoeal
diseases has been suggested in several studies (Mebus et
al., 1973b; Bridger et al., 1978; Gouet et al., 1978).
During the acute stage of infection, virus particles are
excreted in large amounts and have been identified
within brush border cells of the small intestine and in
differentiated colonic epithelial cells. Although propagation
of BECV is difficult in conventional cell lines
(Mebus et al., 1973a), it has been grown successfully in
HRT18 cells (human rectal tumour cell line; Laporte et
al., 1980).

As a member of the Coronaviridae family BECV is a
pleiomorphic, enveloped spherical particle (120 nm in
diameter) surrounded by a fringe of 20 nm long club-shaped
spikes. The coronavirus genome is a positive-sense,
single-stranded, capped RNA with a polyadenylated 3’
end (Siddell et al., 1982; Sturman & Holmes, 1983). The
structural and non-structural proteins of the virus are
translated from a 3’-coterminal nested set of mRNAs,
each having a common 5’ leader sequence (Lai et al.,
1984). Only the unique 5’-terminal sequence is translated;
this sequence is absent from the next smaller RNA of
the set.


BECV possesses five main structural proteins, i.e. a 
phosphorylated nucleocapsid protein (N; 50K), a trans- 
membrane matrix glycoprotein (M; 28K) and three 
peplomer glycoproteins [S1, S2 (S; spike) and haemagglutinin
(HA)]. Glycoproteins S1 (105K) and S2 (95K)
are the cleavage products of the S gene-encoded primary
product (180K; J. F. Vautherot, unpublished results;
Deregt & Babiuk, 1987). HA (M_r 125K) is split by
reducing agents into two subunits of equal size with an
M_r of 65K (Laporte & Bobulesco, 1981; King & Brian,
1982); the neutralizing epitopes are located on S1 and
HA (Vautherot et al., 1984).

The cloning and sequencing of the gene encoding the S 
protein of BECV, a necessary first step for the 
production of a genetically engineered vaccine against 
BECV, is reported in this paper.

BECV strain F15 (BECV-F15) was grown in HRT18 
cells; virus and genomic RNA purifications were 
performed as described previously (Crucière & Laporte,
1988). The virus genome was used as a template for 
cDNA synthesis. Poly(dC)-tailed RNA-cDNA hetero- 
duplexes were inserted into a dG-tailed PstI-linearized 
pBR322 plasmid. Complementary DNAs were then 
cloned in Escherichia coli RR1; tetracycline-resistant, 
ampicillin-susceptible colonies were then transferred 
onto nitrocellulose filters and lysed in situ (Maniatis et al.,
1982). Their DNAs were hybridized overnight at 42 °C,
in hybridization buffer (5 x SSC, 50% formamide,
5 x Denhardt’s solution and 100 µg/ml of sonicated calf
thymus DNA) containing a 32P-labelled random primed
viral insert cDNA (Feinberg & Vogelstein, 1983).

Fig. 1. (a) Schematic diagram of part of the BECV genome and location of cDNA clones. A simplified restriction map is
given; the inserts used as probes to screen the cDNA library are shaded grey; the sequence of the S gene was obtained
with the clones that are boxed in the figure. (b) Alignment and comparison of amino acid sequences of MHV-A59, MHV-JHM
and BECV S propolypeptides. The percentage homology is given; arrows indicate putative peptidase cleavage sites.

After washing the filters three times in 0.1 x SSC,
0.1% SDS at 60 °C for 20 min, they were dried and
autoradiographed on X-ray films (Fuji) at -70 °C for
12 h. Mini-lysates were obtained from the positive clones
(Birnboim & Doly, 1979). After digestion with PstI, the
selected inserts were analysed on Southern blots (Maniatis
et al., 1982). Northern blots were used for the
localization of inserts. Total RNA from BECV-infected 
HRT18 cells was extracted by the guanidinium isothio- 
cyanate method according to Vaquero et al. (1982) and 
electrophoresed in a denaturing 1% agarose gel (6% 
formaldehyde). Transfer, hybridization and washing 
were performed as described for Southern blot 
experiments. 

The primer used to obtain viral cDNA corresponded 
to the BamHI cleavage site (GGATCC) which was found 
at the beginning of our study on a cDNA clone at the 5’ 
end of the gene encoding the N protein. (In fact, this
restriction site was not located at that position, but the
primer was functioning randomly.) The primer yielded 3500 cDNA
clones.


By Northern blot analysis (data not shown), we 
established that the large cDNA insert G7 (2.4 kb)
covered a part of the S gene (Fig. 1); sequence analysis of 
that clone and comparison with the sequence of the gene 
coding for the E2 protein of coronavirus mouse hepatitis 
virus strain JHM (MHV-JHM) (Schmidt et al., 1987) 
showed that this insert mapped in the middle of the S 
gene. G7 cDNA was used to screen the cDNA library 
and to obtain clones. Clones P G7 8 and P 27 40 were 
used also as probes to identify cDNA clones located at 
the 5’ and the 3’ ends of the S gene, respectively (Fig. 1). 

Both DNA strands were sequenced on five overlap- 
ping cDNA clones: P G7, P G78, P G78 12, P 27 40 and 
P 33 23 (Fig. 1a). After it had been established by 
restriction mapping and Southern blot analysis that these 
clones covered the whole length of the S gene, M13 
dideoxynucleotide sequencing was carried out according 
to Sanger et al. (1977) by using sonicated cloned 
fragments subcloned into the SmaI site of the M13 mp10 
vector (Deininger, 1983). Buffer gradient gels and
[α-35S]dATP were used according to Biggin et al. (1983).

Sequence data were analysed and assembled with the
aid of the program of Queen & Korn (1984), the
Beckman Microgenie program (March 1985 version,
Beckman Instruments) adapted for the IBM PC-XT
microcomputer. The nucleotide sequence obtained
(Fig. 2) contains a single long open reading frame
(ORF) in the mRNA sense which extends from the first
ATG codon (nucleotides 31 to 33) to nucleotide 4122.
This 4092 nucleotide sequence contains 28% A, 16.2% C,
19.7% G and 36.2% T, and has a coding capacity for 1363
amino acids (Fig. 2) with an M_r of 150747 and a
proposed pHi of 5.6.
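
The coordinates quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes that nucleotide 4122 is the last base of the stop codon, which is the reading under which 4092 nucleotides correspond to 1363 amino acids.

```python
def orf_summary(start: int, end: int) -> dict:
    """Summarise an ORF given 1-based, inclusive coordinates.

    Assumption (not stated explicitly in the text): `end` is the last
    base of the stop codon, so one codon does not encode an amino acid.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length // 3
    return {"nucleotides": length, "codons": codons, "amino_acids": codons - 1}


# Coordinates reported for the BECV S gene ORF.
print(orf_summary(31, 4122))

# Reported base composition (percent); rounding in the published
# figures means the total need not be exactly 100.
composition = {"A": 28.0, "C": 16.2, "G": 19.7, "T": 36.2}
print(round(sum(composition.values()), 1))
```

Under that assumption the 4092 nucleotides give 1364 codons, i.e. 1363 coding codons plus the stop codon, in agreement with the reported coding capacity.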

The length of the ORF is in the expected size range for 
the glycoprotein S gene sequence. Furthermore, comparison
with the published S gene sequence of coronavirus
MHV-A59 (Luytjes et al., 1987) shows homology
which increases from 61% at the 5’ end of the ORF
(BECV nucleotides 31 to 1513) to 73.8% at the 3’ end
(BECV nucleotides 2711 to 4100).

Immediately upstream from the first initiation codon 
there is a sequence ATCTAAACAT very similar to the 
conserved intergenic sequences of BECV, MHV-JHM 
and MHV-A59 (Lapps et al., 1987; Crucière & Laporte,
1988; Luytjes et al., 1987; Schmidt et al., 1987); it is also 
closely related to the conserved sequence AACTAAAC, 
reported for the transmissible gastroenteritis virus 
(TGEV) (Rasschaert & Laude, 1987). The sequence 
surrounding the translation initiation codon is in a sub- 
optimal environment (Kozak, 1987). A similar situation 
has been observed for the initiation codon of the S gene 
of MHV (Luytjes et al., 1987).

Comparison of the first 400 nucleotides of the PG7 8 
12 clone (P. Boireau & J. Laporte, unpublished 
observations) with the recently published HA-encoding 
gene sequence of bovine coronavirus (Parker et al., 1989) 
led us to the conclusion that the S gene is just downstream 
of the HA gene. 

The deduced amino acid sequence of the BECV S 
protein contains 19 potential N-glycosylation sites. 
Assuming a mean M_r value of 2100 per carbohydrate
chain (Hunter et al., 1983), the M_r of the mature S
glycoprotein would be approximately 190600. It appears
to be hydrophobic over most of its length (35% 
hydrophobic amino acids). 

This protein shares some properties with S proteins 
described for other coronaviruses. After the first meth- 
ionine residue, there is a potential signal sequence with a 
hydrophobic core of 13 amino acids and a helix-breaking 
residue, glycine 17 (Watson, 1984). According to the rule 
established for a potential cleavage site (von Heijne, 
1984), a protease could act between amino acid residues 
17 and 18. The consequence would be that the first amino 
acid in the S protein is aspartic acid 18. 

Another potential peptidase cleavage site is located in 
the hydrophilic peak between residues 763 and 769. This 
sequence, Lys-Arg-Arg-Ser-Val-Arg, is collinear with the 
experimentally determined cleavage site of the MHV- 
A59 S protein (Luytjes et al., 1987), and very similar to 
the cleavage site of the infectious bronchitis virus (IBV) 
S protein (Binns et al., 1985). Furthermore, this series of
basic amino acids resembles the tryptic cleavage sites of
peptidic prohormones or the F0 protein of Newcastle
disease virus (MacGinnes & Morrison, 1986), which are
processed in the trans-Golgi apparatus. The coronavirus 
S protein uses the same cellular metabolic pathway for 
maturation, and budding of the virus takes place in the 
Golgi apparatus and in the endoplasmic reticulum 
membrane. 

Tryptic cleavage would explain the maturation of the S 
protein of BECV: the primary gene product, P150 (i.e.
the S polypeptide with an M_r of 148967 without the
signal peptide) is glycosylated in the endoplasmic
reticulum and in the Golgi apparatus giving rise to a
glycoprotein of M_r 188867 (gp190), which is then cleaved
into the S1 (M_r 104692) and S2 (M_r 84175) structural proteins.
These results are in agreement with those published by
Deregt & Babiuk (1987), using immunoelectrophoresis to
study the biosynthesis of gp105 and gp95, and with our
own observations (J. F. Vautherot et al., unpublished
results).
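
The molecular mass figures quoted for the maturation pathway are mutually consistent, as the short bookkeeping sketch below shows. All numbers are taken from the text; the variable and function names are illustrative, and the calculation assumes that every one of the 19 potential sites carries one chain of M_r 2100.

```python
MR_PER_CHAIN = 2100   # mean M_r per carbohydrate chain (Hunter et al., 1983)
N_SITES = 19          # potential N-glycosylation sites

PRECURSOR_MR = 150747  # primary translation product, signal peptide included
P150_MR = 148967       # polypeptide without the signal peptide
S1_MR, S2_MR = 104692, 84175


def glycosylated_mr(polypeptide_mr: int,
                    n_sites: int = N_SITES,
                    mr_per_chain: int = MR_PER_CHAIN) -> int:
    """Polypeptide M_r plus one carbohydrate chain per potential site."""
    return polypeptide_mr + n_sites * mr_per_chain


signal_peptide_mr = PRECURSOR_MR - P150_MR
gp_with_signal = glycosylated_mr(PRECURSOR_MR)  # quoted as approximately 190600
gp190 = glycosylated_mr(P150_MR)

print(signal_peptide_mr, gp_with_signal, gp190, S1_MR + S2_MR)
```

The sum of the S1 and S2 values equals the gp190 figure, which suggests that the values quoted for S1 and S2 include the carbohydrate contribution.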

Using the approach described by de Groot et al. (1987) 
for the S proteins of feline infectious peritonitis virus, 
MHV and IBV and by Rasschaert & Laude (1987) for the 
S protein of TGEV, we also demonstrated two amphipathic
α-helices for the S protein of BECV; they are
located between amino acids 1007 and 1077 and between
amino acids 1269 and 1294. These α-helices may constitute
the stalk of viral spikes, with a length of approximately
12 nm.

Using the method described by Rao & Argos (1986) a 
hydrophobic transmembrane α-helix was also predicted
between amino acids 1305 and 1335. Its C-terminal 
location, the presence of a potential myristylation site on 
glycine 1333 and comparison with other coronaviruses 
suggest that this helix is involved in the anchorage of the 
spike in the viral membrane. The myristylation site, 
surrounded by eight cysteine residues, would be located 
at the internal face of the viral membrane. Among 
coronaviruses this domain is highly conserved in 
structure, location and length. A stretch of eight amino 
acids, Lys-Trp-Pro-Trp-Tyr-Val-Trp-Leu (1305 to 1312; 
Fig. 2), is found in all coronavirus S protein sequences 
established so far; however its role is unknown. 
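
The general idea behind such membrane-helix predictions can be illustrated with a sliding-window hydropathy scan. The sketch below deliberately swaps in the Kyte-Doolittle hydropathy scale rather than the Rao & Argos (1986) conformational parameters used here, so it is an analogy and not a reimplementation; the window of 19 residues and the threshold of 1.6 are a commonly used rule of thumb adopted as assumptions, and the toy sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values (a substitute for the Rao & Argos
# parameters, which are not reproduced here).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every window of `window` consecutive residues."""
    scores = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        scores.append(sum(KYTE_DOOLITTLE[aa] for aa in segment) / window)
    return scores


def hydrophobic_windows(protein: str, window: int = 19,
                        threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean score reaches the threshold."""
    return [i + 1 for i, score in enumerate(window_hydropathy(protein, window))
            if score >= threshold]


# Hypothetical toy sequence: a leucine stretch flanked by charged residues.
toy = "KKDE" + "L" * 20 + "RRDE"
scores = window_hydropathy(toy)
print(hydrophobic_windows(toy))
```

A run of consecutive windows above the threshold near the C terminus of a deduced sequence would be the kind of signal interpreted as a candidate membrane anchor.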

Luytjes et al. (1987, 1988) have compared S amino acid
sequences of the two MHV strains A59 and JHM. These
sequences are very similar but the S protein of JHM was
found to be shorter. As BECV belongs to the same
antigenic group as these viruses, the present results
enlarge upon this comparison. Dot matrix analyses
(Beckman Microgenie program) of the deduced amino
acid sequence of the S genes of BECV and MHV-A59
(data not shown), revealed that there is low homology
(55%) between amino acids 488 to 593 of BECV and
amino acids 481 to 545 of MHV-A59 and apparently
there is a deletion in the MHV-A59 S protein sequence.
More details emerged after alignment comparison of the
two sequences (Fig. 1b). Amino acids 488 to 534 of
BECV have no counterpart in MHV-A59. Furthermore,
amino acids 452 to 593 of the S protein of BECV have no
counterpart in MHV-JHM (Fig. 1).

Fig. 2. Nucleotide sequence of the gene encoding the S protein and predicted amino acid sequence of the S protein. The
sequence deduced from the cDNA inserts (Fig. 1a) is shown as positive sense DNA from 5’ to 3’ ends; the hatched bar
indicates the proposed N-terminal signal sequence. An arrowhead indicates the potential tryptic cleavage site. A box
surrounds the conserved intergenic sequence. Potential N-glycosylation sites are underlined (specific for the Asn-X-Ser/Thr
sequence, with X different from Pro). Dots mark the proposed C-terminal membrane anchoring domain. The last
eight cysteines are encircled.
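
A dot matrix comparison of the kind mentioned above can be sketched in a few lines. This is a generic illustration, not the Beckman Microgenie implementation: the function name, window size and identity threshold are assumptions, and the two toy sequences are hypothetical, the second being the first with five residues removed to mimic a deletion.

```python
def dot_matrix(seq_a: str, seq_b: str, window: int = 5,
               min_identity: float = 1.0) -> list[tuple[int, int]]:
    """Return 1-based (i, j) window starts where the two windows share at
    least `min_identity` fraction of identical residues."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(a == b for a, b in
                          zip(seq_a[i:i + window], seq_b[j:j + window]))
            if matches / window >= min_identity:
                hits.append((i + 1, j + 1))
    return hits


# Hypothetical toy sequences: `shorter` lacks residues 11 to 15 of `longer`.
longer = "ACDEFGHIKLMNPQRSTVWY"
shorter = "ACDEFGHIKLSTVWY"
hits = dot_matrix(longer, shorter)
print(hits)
```

In the resulting list the main diagonal is interrupted and resumes with an offset of five positions, which is how a deletion in one sequence relative to the other shows up in a dot matrix plot.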

Luytjes et al. (1987, 1988) put forward two hypotheses 
to account for the difference between MHV-A59 and 
MHV-JHM in this domain of the S protein. The first 
hypothesis is that the MHV-JHM genome is deleted with 
respect to a nucleotide sequence corresponding to amino 
acids 453 to 545 of the S protein of MHV-A59; the 
second is that the MHV-A59 genome has acquired 
genomic material by non-homologous recombination. 
The comparisons presented above support the idea of a
genetic instability in this area of the virus genome which
would explain the difference in length of the S proteins of
the three virus strains. Therefore we suggest an evolutionary
progression from MHV-JHM to MHV-A59 to
BECV or in reverse depending upon whether non-homologous
recombination events or deletions have
occurred. The observation that the nucleotide sequence
of the S gene encoding amino acids 470 to 480 (data not
shown) is highly homologous (81%) between BECV and
MHV-A59 and does not exist in MHV-JHM leads us to
propose the occurrence of deletions in this genome
segment. Our suggestion is supported by the identification
of a functional HA gene in BECV (Parker et al.,
1989) whereas only a related HA pseudogene was
identified in MHV-A59 (Luytjes et al., 1988); this is
most likely to occur if the BECV genome is more closely
related to a possible common ancestor.


We would like to thank Jean Francois Vautherot, Denis Rasschaert 
and Michel Brémont for helpful discussions. 




References 


BIGGIN, M. D., GIBSON, T. J. & HONG, G. F. (1983). Buffer gradient
gels and 35S label as an aid to rapid DNA sequence determination.
Proceedings of the National Academy of Sciences, U.S.A. 80, 3963-3965.

BINNS, M. M., BOURSNELL, M. E. G., CAVANAGH, D., PAPPIN, D. J. C.
& BROWN, T. D. K. (1985). Cloning and sequencing of the gene
encoding the spike protein of the coronavirus IBV. Journal of General
Virology 66, 719-726.

BIRNBOIM, H. C. & DOLY, J. (1979). A rapid alkaline extraction
procedure for screening recombinant plasmid DNA. Nucleic Acids
Research 7, 1513-1523.

BRIDGER, J. C., WOODE, G. N. & MEYLING, A. (1978). Isolation of
coronaviruses from neonatal calf diarrhoea in Great Britain and
Denmark. Veterinary Microbiology 3, 101-103.

CRUCIÈRE, C. & LAPORTE, J. (1988). Sequence and analysis of bovine
enteric coronavirus (F15) genome I. Sequence of the gene coding for
the nucleocapsid protein; analysis of the predicted protein. Annales
de l'Institut Pasteur 139, 123-138.

DE GROOT, R. J., LENSTRA, J. A., LUYTJES, W., NIESTERS, H. G. M.,
HORZINEK, M. C., VAN DER ZEIJST, B. A. M. & SPAAN, W. J. M.
(1987). Sequence and structure of the coronavirus peplomer protein.
In Biochemistry and Biology of Coronaviruses, pp. 31-38. Edited by
M. M. C. Lai & S. A. Stohlman. New York: Plenum Press.

DEININGER, P. L. (1983). Random subcloning of sonicated DNA:
application to shotgun DNA sequence analysis. Analytical Biochemistry
129, 216-223.

DEREGT, D. & BABIUK, L. A. (1987). Monoclonal antibodies to bovine
coronavirus: characteristics and topographical mapping of neutralizing
epitopes on the E2 and E3 glycoproteins. Virology 161, 410-420.

FEINBERG, A. P. & VOGELSTEIN, B. (1983). A technique for
radiolabeling DNA restriction endonuclease fragments to high
specific activity. Analytical Biochemistry 132, 6-13.

GOUET, PH., CONTREPOIS, M., DUBOURGUIER, H., RIOU, Y., SCHERRER,
R., LAPORTE, J., VAUTHEROT, J. F., COHEN, J. & L’HARIDON, R.
(1978). The experimental production of diarrhea in colostrum-deprived
axenic and gnotoxenic calves with enteropathogenic E. coli,
rotavirus, coronavirus and in combined infection of rotavirus and E.
coli. Annales de Recherches Vétérinaires 9, 433-440.

HUNTER, E., HILL, E., HARDWICK, M., BROWN, A., SCHWARTZ, D. E.
& TIZARD, R. (1983). Complete sequence of the Rous sarcoma virus
env gene: identification of structural and functional regions of its
product. Journal of Virology 46, 920-936.

KING, B. & BRIAN, D. A. (1982). Bovine coronavirus structural
proteins. Journal of Virology 42, 700-707.

KOZAK, M. (1987). At least six nucleotides preceding the AUG
initiator codon enhance translation in mammalian cells. Journal of
Molecular Biology 196, 947-950.

LAI, M. M. C., BARIC, R. S., BRAYTON, P. R. & STOHLMAN, S. A. (1984).
Characterization of leader RNA sequences on the virion and
mRNAs of mouse hepatitis virus, a cytoplasmic RNA virus.
Proceedings of the National Academy of Sciences, U.S.A. 81, 3626-3630.

LAPORTE, J. & BOBULESCO, P. (1981). Polypeptide structure of bovine
enteric coronavirus: comparison between a wild strain purified from
feces and a HRT18 cell adapted strain. In Biochemistry and Biology of
Coronaviruses, pp. 181-184. Edited by V. ter Meulen, S. Siddell & H.
Wege. New York: Plenum Press.

LAPORTE, J., BOBULESCO, P. & ROSSI, F. (1980). Une lignée cellulaire
particulièrement sensible à la réplication du coronavirus entéritique
bovin: les cellules HRT18. Comptes rendus de l’Académie des sciences
290, 623-626.

LAPPS, W., HOGUE, B. G. & BRIAN, D. A. (1987). Sequence analysis of
the bovine coronavirus nucleocapsid and matrix protein genes.
Virology 157, 47-57.


LUYTJES, W., STURMAN, L. S., BREDENBEEK, P. J., CHARITÉ, J., VAN DER
ZEIJST, B. A. M., HORZINEK, M. C. & SPAAN, W. J. M. (1987).
Primary structure of the glycoprotein E2 of coronavirus MHV-A59
and identification of the trypsin cleavage site. Virology 161, 479-487.

LUYTJES, W., BREDENBEEK, P. J., NOTEN, A. F. H., HORZINEK, M. C. &
SPAAN, W. J. M. (1988). Sequence of mouse hepatitis virus A59
mRNA2: indications for RNA recombination between coronaviruses
and influenza C virus. Virology 166, 415-422.

MACGINNES, L. W. & MORRISON, T. G. (1986). Nucleotide sequence of
the gene encoding the Newcastle disease virus fusion protein and
comparisons of paramyxovirus fusion protein sequences. Virus
Research 5, 343-356.

MANIATIS, T., FRITSCH, E. F. & SAMBROOK, J. (1982). Molecular
Cloning: A Laboratory Manual. New York: Cold Spring Harbor
Laboratory.

MEBUS, C. A., STAIR, E. L., RHODES, M. B. & TWIEHAUS, M. J. (1973a).
Neonatal calf diarrhea: propagation, attenuation, and characteristics
of a coronavirus-like agent. American Journal of Veterinary
Research 34, 145-150.

MEBUS, C. A., STAIR, E. L., RHODES, M. B. & TWIEHAUS, M. J. (1973b).
Pathology of neonatal calf diarrhea induced by a coronavirus-like
agent. Veterinary Pathology 10, 45-64.

PARKER, M. D., COX, G. J., DEREGT, D., FITZPATRICK, D. R. &
BABIUK, L. A. (1989). Cloning and in vitro expression of the E3
haemagglutinin glycoprotein of bovine coronavirus. Journal of
General Virology 70, 155-164.

QUEEN, C. & KORN, L. J. (1984). A comprehensive sequence analysis
program for the IBM personal computer. Nucleic Acids Research 12,
581-599.

RAO, M. J. K. & ARGOS, P. (1986). A conformational preference
parameter to predict helices in integral membrane proteins.
Biochimica et biophysica acta 869, 197-214.

RASSCHAERT, D. & LAUDE, H. (1987). The predicted primary structure
of the peplomer protein E2 of the porcine coronavirus transmissible
gastroenteritis virus. Journal of General Virology 68, 1883-1890.

SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing
with chain-terminating inhibitors. Proceedings of the National
Academy of Sciences, U.S.A. 74, 5463-5467.

SCHMIDT, I., SKINNER, M. & SIDDELL, S. (1987). Nucleotide sequence of
the gene encoding the surface projection glycoprotein of coronavirus
MHV-JHM. Journal of General Virology 68, 47-56.

SIDDELL, S., WEGE, H. & TER MEULEN, V. (1982). The structure and
replication of coronaviruses. Current Topics in Microbiology and
Immunology 99, 131-163.

STURMAN, L. S. & HOLMES, K. V. (1983). The molecular biology of
coronaviruses. Advances in Virus Research 28, 35-112.

VAQUERO, C., SANCEAU, J., CATINOT, L., ANDREU, G., FALCOFF, E. &
FALCOFF, R. (1982). Translation of mRNA from phytohemagglutinin-stimulated
human lymphocytes: characterization of interferon
mRNAs. Journal of Interferon Research 2, 217-228.

VAUTHEROT, J. F., LAPORTE, J., MADELAINE, M. F., BOBULESCO, P. &
ROSETO, A. (1984). Antigenic and polypeptide structure of bovine
enteritic coronavirus as defined by monoclonal antibodies. In
Molecular Biology and Pathogenesis of Coronaviruses, pp. 117-132.
Edited by P. J. M. Rottier, B. A. M. van der Zeijst, W. J. M. Spaan &
M. C. Horzinek. New York: Plenum Press.

VON HEIJNE, G. (1984). How signal sequences maintain cleavage
specificity. Journal of Molecular Biology 173, 243-251.

WATSON, M. E. E. (1984). Compilation of published signal sequences.
Nucleic Acids Research 12, 5145-5164.


(Received 29 March 1989; Accepted 19 October 1989) 


